National Repository of Grey Literature: 13 records found
Intelligent Data Scraping in a Web Browser
Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The goal of this thesis is to extract data from web pages without knowledge of their internal structure. The structure is recognized by an algorithm from input information describing the content that the user wants to extract. The structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in the real world or serve as a basis for further development.
Image object detection using template
Novák, Pavel ; Mašek, Jan (referee) ; Burget, Radim (advisor)
This thesis focuses on image object detection using a template. Its main contribution is a new method for feature extraction from the Histogram of Oriented Gradients using a set of comparators. The thesis describes the image comparison and feature extraction methods used, with most attention given to the Histogram of Oriented Gradients method, on which the work is based. A small training data set (100 samples) is used, verified by cross-validation and followed by tests on real scenes. The success rate achieved under cross-validation is 98% for the SVM algorithm.
Environment for analyzing suspicious device
Procházka, Jan ; Martinásek, Zdeněk (referee) ; Malina, Lukáš (advisor)
This bachelor thesis focuses on the design of an environment for the analysis of a suspicious device. Such a device may be, for example, a disc contaminated by malicious code or a mobile device. The aim of this work is to design an efficient and simple solution using open source products. The final environment should be capable of performing both surface and in-depth data analysis. The theoretical part offers information related to the scope of the addressed problem and introduces terms such as Sandbox, Malware, and Android. These are described with a view to understanding the analysis of malware occurring predominantly on mobile devices. The practical part describes the hardware and software used in the design of the environment and contains examples of analyses of external devices contaminated by malicious code. These examples are mainly for Android mobile devices.
Information Extraction from Forms
Pálinkás, Adam
This thesis discusses the design and implementation of an application that uses advanced text recognition and image processing techniques to process scanned forms filled in by hand. Existing methods and techniques for text recognition are analyzed. Selected methods and techniques are implemented to create the final solution, which streamlines form processing at CYRRUS, a. s.
Data Extraction from PDF Documents
Bartošák, Michal ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The work focuses on extracting information from medical records saved in PDF format, which were created by heart pacemakers during regular patient monitoring in the hospital. The result of this work is a desktop application written in Java that retrieves and analyzes data from records using PDFBox and pdf2dom libraries. The output of the application is a CSV file, which represents the acquired values in table form, as well as extracted images that are saved to a user-defined output folder. Application testing on records from three different companies proved that record extraction is highly reliable (with overall precision and recall metrics reaching almost 100 % in every test), provided that the application arguments are correctly set.
Reliability Analysis of Forensic Tools for Examining Small Digital Devices
PĚSTOVÁ, Karolína
This bachelor thesis deals with forensic tools for investigating small digital devices. The theoretical part describes the principles of digital forensic analysis and approaches to examining mobile phones. In the practical part, selected mobile phones are analysed with tools for investigating small digital devices; the results are evaluated and a solution for acquiring the most relevant data is proposed.
Design and Implementation of System for Aggregations of Real Estate Offers in the Czech Republic
Drobník, Jakub ; Kučera, Jan (advisor) ; Chlapek, Dušan (referee)
The diploma thesis deals with the design and implementation of software for aggregations of real estate offers in the Czech Republic. The aim of the thesis is to create a system which aggregates the data of real estate offers from web pages. This thesis consists of two basic parts. The context of creating the system is described in the first part. The author discusses ways to retrieve data from websites - especially the extraction of data using automated robots - in the first part of the thesis. The design and implementation of the system are described in the second part. The author and sponsor define requirements for the system in the second part of the thesis. The outcome of this thesis is a prototype that aggregates data from real estate portals into the prepared database. The main contribution of the thesis is an example of a possible approach that can aggregate data from a particular market segment and put it into the database.
Web page data figure finder
Janata, Dominik ; Vojtáš, Peter (advisor) ; Nečaský, Martin (referee)
The thesis addresses automatic extraction of semantic data from Web pages. Within this broad problem, it focuses on finding the values of data figures within a page presenting a certain entity (e.g. the price of a laptop). The main idea we wanted to evaluate is that a figure can be found using its context in the page: the words that surround it and the values of the attributes of the containing HTML tags, the class attribute in particular. Our research revealed two types of contemporary solutions to this problem: either the author of the Web page must inline semantic information in the markup of the page, or commercial tools are trained to parse a particular page format (targeting pages from a single Web domain). We examined the possibilities of developing a general solution that would, for a given entity, find its properties across Web domains using text analysis and machine learning. The naïve algorithm had about 30% accuracy; the learning algorithms had an accuracy between 40 and 50% in finding the properties. Although this accuracy is not acceptable for a final solution, we believe it confirms the potential of the idea. Keywords: Web pages data extraction
